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Abstract 

We consider a fundamental algorithmic question in spectral graph theory: Compute a spectral 
sparsifier of a random-walk matrix-polynomial 

d 

L a (G) = D - ^a r D • (D -1 A) r 

r =1 

where A is the adjacency matrix of a weighted, undirected graph, D is the diagonal matrix of 
weighted degrees, and a = (oq, ad) are nonnegative coefficients with Ylr=i a r = 1- Recall 
that D X A is the transition matrix of random walks on the graph. In its linear form (when d = 

1), the matrix polynomial becomes D — A, which is a Laplacian matrix , and hence this problem 
becomes the standard spectral sparsification problem, which enjoys nearly linear time solutions 
[ST11, SS11]. However, the sparsification of L a (G) appears to be algorithmically challenging as 
the matrix power (D _1 A) r is defined by all paths of length r, whose precise calculation would 
be prohibitively expensive (due to the cost of matrix multiplication and densification in the 
matrix powers). 

In this paper, we develop the first nearly linear time algorithm for this sparsification problem: 

For any G with n vertices and m edges, d coefficients a, and e > 0, our algorithm runs in time 
Old 2 • m ■ log 2 n/e 2 ) to construct a Laplacian matrix L = D — A with 0(n\ogn/ e 2 ) non-zeros 
such that 

d 

L w e L a (G) = D “rD ■ ( D_lA ) r • 

r —1 

In the equation, L L„(G) denotes that L and L„(G) are spectrally similar within a factor 
of 1 ± e as defined in [ST11], 

Matrix polynomials arise in mathematical analysis of matrix functions as well as numerical 
solutions (such as Newton’s method) of matrix equations. Our work is particularly motivated 
by the algorithmic problems for speeding up the classic Newton’s method in applications such 
as computing the inverse square-root of the precision matrix of a Gaussian random field (in 
order to obtain i.i.d random samples of the graphic model), as well as computing the g th -root 
transition (for q > 1) in a time-reversible Markov model. The key algorithmic step for both 
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applications is the construction of a spectral sparsifier of a constant degree 1 random-walk matrix- 
polynomials introduced by Newton’s method. Our sparsification algorithm leads to a simpler 
and faster algorithm for these problems than the previous one [CCL+14] that circumvents the 
challenging problem of sparsifying high-degree random-walk matrix polynomials at the cost 
of slower convergences and complex approximation. Our algorithm can also be used to build 
efficient data structures for effective resistances for multi-step time-reversible Markov models, 
and we anticipate that it could be useful for other tasks in network analysis. 


1 Introduction 

Polynomials are used in many fields of mathematics and science for encoding equations that model 
various physical, biological and economical processes. In scientific computing and its underpinning 
numerical analysis, polynomials appear naturally in (truncated) Taylor series, the fast multipole 
method [GR87], and various numerical approximations. These computational methods are respon¬ 
sible for a large part of engineering and scientific simulations ranging from weather forecasting to 
earthquake modeling [SF73] to particle/galaxy simulation [GR87]. 

Like its scalar counterpart, matrix polynomials of the form, Ci ■ M l , arise in mathematical 
analysis of matrix functions and dynamical systems, as well as numerical solution of matrix equa¬ 
tions. One class of matrices of particular importance in network analysis is the adjacency matrix of 
a weighted, undirected graph G. We will denote these matrices using A. If we use D to denote the 
diagonal matrix containing weighted degrees of vertices, D _1 A is the transition matrix of random 
walks on the graph. Powers of this matrix correspond to multiple steps of random walks on the 
graph, and are graphs themselves. However, they are usually dense, and are cost-prohibitive both 
in time and memory to construct when the input graph G is of large-scale. Our objective is to 
efficiently construct sparse approximations of this natural family of random-walk matrices. 

1.1 Motivation from Gaussian Sampling and Fractional Markov Transition 

A problem that motivates our study of random walk polynomials is the inverse square-root problem: 
find a linear operator C such that C T C is close to the inverse of the Laplacian matrix. Laplacian 
matrices are a subclass of SDDM matrices 2 and have standard split-form representation of L = 
D — A. When we apply an extension of the classical Newton’s method to this form we can reduce 
the problem to that of factoring D — (|D • (D _ 1 A) 2 + 3 D • (D" 1 A) 3 ), which has smaller spectral 
radius, by using the matrix identity 

(D - A)-5 = (l + Id^a) (d - Qd • (D^A ) 2 + Id • (D^A) 3 ^ * . ( 1 ) 

Finding inverse square-root factorizations is a key step in sampling from Gaussian graphical 
models: Given a graphical model of a Gaussian random field specified by its precision matrix A 
and potential vector h , i.e., Pr(x|A, h) oc exp(— |x T Ax + ft, T x), efficiently generate i.i.d random 
samples from this multivariate Gaussian distributions [LW12]. If one can compute an efficient 
sparse representation of C ~ A -1 / 2 , then one can convert i.i.d. standard Gaussian random vector 

1 In numerical algorithms where random-walk matrix-polynomials arise, the degree d of the polynomials is usually 
either a constant or bounded above by a polylogarithmic function in n. 

2 SDDM matrices are positive definite symmetric diagonal dominant matrices with non-positive off-diagonal ele¬ 

ments. 
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z using x = Cz + fji (where n = A _ 1 h) to i.i.d random vectors of a Gaussian random field that 
numerically approximates the one defined by (A, h) [CCL + 14]. Furthermore, if the precision matrix 
A = (A ij) is symmetric diagonally dominant (SDD), i.e., for all i, A ? ;,j > Ylj# |Aij|, then one can 
reduce this factorization problem to the problem formulated by Equation (1) involving an SDDM 
matrix. 

Then, in order to iteratively apply Equation (1) to build an efficient representation of the inverse 
square-root factor of D — A, one needs to efficiently construct the second term in Equation (1), 

D - • (D - 1 A) 2 + Id • (D - 1 A) 3 ) . (2) 

The quadric and cubic powers in this matrix can be very dense, making exact computations involv¬ 
ing them expensive. Instead, we will directly compute an approximation of this matrix that still 
suffices for algorithmic purposes. 

Finding an inverse square-root of an SDDM matrix is a special case of the following basic 
algorithmic problem in spectral graph theory and numerical analysis [CCL + 14]: 

Given annxn SDDM matrix M, a non-zero integer q, and an approximation parameter 
e, compute an efficient sparse representation of an n x n linear operator C such that 

M 1/q ra e CC T 

where is spectral similarity between linear operators which we will define at the start of Section 2. 

The matrix q th - root computation appears in several numerical applications and particularly in 
the analysis of Markov models [HL11]. For example, in his talk for Brain Davies’ 65 Birthday 
conference (2009), Nick Higham quoted an email that he received from a power company regarding 
the usage of an electricity network to illustrate the practical needs of taking the q th - root of a Markov 
transition. 

“I have an Excel spreadsheet containing the transition matrix of how a company’s [Stan¬ 
dard & Poor’s] credit rating charges from on year to the next. I’d like to be working in 
eighths of a year, so the aim is to find the eighth root of the matrix. ” 

In our case, note that when the graph is connected, D -1 A is the transition matrix of a reversible 
Markov chain [AF02], and the first order approximation of the g^-root transition is I — (I — 
D _ 1 A) 1 / ? . Extension of Newton’s method then leads to an iterative formula similar to Equation 
(1) for finding factorization of the q th root. Thus, the key algorithmic task for obtaining a nearly 
linear time Newton(-like) algorithm for q th - root factorizations is the efficient approximation of 
matrix polynomials akin to Equation (2). 

1.2 Main Technical Contribution 

We start with a definition that captures the matrix polynomials such like Equation (2) that arise 
in the application of Newton’s or Newton-like methods to graph Laplacians. 

Definition 1.1 (Random-Walk Matrix-Polynomials). Let A and D be the adjacency matrix and 
diagonal weighted degree matrix of a weighted, undirected graph G respectively. For a non-negative 
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vector a = (aq,..., ay) with Ylr—i °V = 1 , matrix 

d 

L a (G) = D - £ a r D • (D _1 A) r (3) 

r=l 

is a d-degree random-walk matrix-polynomial of G. 

Random-walk matrix-polynomials naturally include the graph Laplacian G as the linear case: 
when d = 1, the matrix polynomial becomes L(G) = D — A, which is the Laplacian matrix of 
G. In fact, the following proposition can be established by a simple induction, which we prove in 
Appendix A. 

Proposition 1.2 (Laplacian Preservation). For any weighted, undirected graph G with adjacency 
matrix A and diagonal matrix D. for every non-negative vector a = (aq,...,ay) such that with 
X)r=i a r = 1; the random-walk matrix-polynomial ~L a (G) remains a Laplacian matrix. 

Consequently, applying spectral sparsification algorithms [ST11, SS11, BSS12] to L a (G) gives: 

Proposition 1.3 (Spectral Sparsifiers of Random-Walk Matrix Polynomials). For all G and a as 
in Proposition 1.2, for any e > 0, there exists a Laplacian matrix L = D — A with 0(n log n/e 2 ) 
non-zeros such that for all x £ ML 1 

(1 — e) • x T Lx < x T — '^2 a rD • (D _1 A) T ^ x < (1 + e) ■ x t TuX. (4) 

The Laplacian matrix L satisfying Equation (4) is called spectrally similar with approximation 
parameter e to L a (G) [ST11], The computation of a (nearly) linear size spectral sparsifier of a 
dense Laplacian matrix is a fundamental algorithmic problem in spectral graph theory that has 
been used in solving linear systems [ST14, KMP10] and combinatorial optimization [CKM + 11], The 
work of [SSI 1, ST 11 ] showed that spectral sparsifier of 0(n log n/e 2 ) non-zeros can be constructed 
in 0(m log 2 n/e 2 ) time for any n x n Laplacian matrix L with m non-zeros. A recent technique of 
[PS 14] can sparsify a degree 2 random-walk matrix-polynomial in nearly linear time. 

In this paper, we give the first nearly linear time spectral-sparsification algorithm for all random- 
walk matrix-polynomials. Our sparsification algorithm is built on the following key mathematical 
observation that might be interesting on its own: One can obtain a sharp enough upper bound on 
the effective resistances of the high-order polynomial D — D(D~ 1 A) r from the combination of the 
linear D — A and quadratic D — AD X A polynomials. 

This allows us to design an efficient path sampling algorithm that utilizes this mathematical 
observation to achieve the critical sparsification. We prove the following result which generalizes 
the works of [ST11, SS11, PS14], 

Theorem 1.4 (Random-Walk Polynomials Sparsification). For any weighted, undirected graph G 
with n vertices and m non-zeros, for every non-negative vector a = (aq, ..., ay) with Ylt=i a r = 
for any e > 0, we can construct in time 0(d 2 ■ m ■ log 2 n/e 2 ) a spectral sparsifier with 0 (nlogn/e 2 ) 
non-zeros and approximation parameter e for the random-walk matrix-polynomial Lq,(G). 

The total work of our sparsification algorithm depends quadratically in the degree of the poly¬ 
nomial, which could be expensive when d = 0(n c ) for some constant c > 0. In Section 4 we present, 
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for even degrees d, a more efficient algorithm to sparsify D — D(D 1 A) rf . We will show that, for 
any positive integer r, if we are given A such that 

D- A«D-D(D _1 A) 2r , (5) 

we can construct a sparse matrix A x and a sparse matrix A + such that 

D- A x « D-D(D _1 A) 4r and D - A+ w D - D (D' 1 A) 2r+4 . ( 6 ) 

Applying these two routines inductively gives an algorithm that, for any d divisible by 4, 
approximates the d-degree random-walk matrix-monomial in time polylogarithmic in d. We can 
also extend this algorithm to handle all the even-degree monomials. Because when d = 0(l/e), we 
can directly invoke Theorem 1.4 to sparsify D — D(D -1 A) rf , and when d = Q( 1/e), we know that 
D - D(D- 1 A) d D - D(D" 1 A) <i + 2 , therefore any even-degree monomial can be replaced by a 
degree 4r monomial, while introducing a small error only. 

Theorem 1.5 (High Degree). For any even integer d, let L Q d = D — D(D _1 A) d be the d-step 
random walk matrix, For any e > 0, we can construct a graph Laplacian Lg with 0(n logn/e 2 ) 
nonzero entries, in total work (){rn ■ log 3 n ■ log 5 (d)/e 4 ), such that ~ e L Q d . 

While we build our construction on the earlier work [ST11, SSI 1, PS14] for graph sparsification, 
we need to overcome some significant difficulties posed by high degree matrix polynomials, which 
appear be algorithmically challenging: The matrix (D _1 A) r is defined by all paths of length r, 
whose precise calculation would be prohibitively expensive due to the cost of matrix multiplication 
and densification in the matrix powers. Moreover, the algorithm of [PS14] relies an explicit clique¬ 
like representation of edges in the quadratic power and expanders, which is much more specialized. 


1.3 Some Applications 


Matrix polynomials are involved in numerical methods such as Newton’s or Newton-like methods, 
which have been widely used for finding solutions of matrix equations. Our sparsification algorithm 
can be immediately applied to speed up these numerical methods that involves SDDM matrices. 
For example, in the application of finding the inverse square root of an SDDM matrix D — A, we 
can directly sparsify the cubic matrix polynomial given in Equation 2, and iteratively approximate 
the inverse-square root factor of D — A using Equation 1. This leads to a simpler and faster 
algorithm than the one presented in [CCL + 14], which circumvents the challenging problem of 
sparsifying high-degree random-walk matrix polynomials at the cost of slower convergences and 
complex approximation. The simplicity of the new algorithm comes from the fact that we no longer 
need the Maclaurin series for conditioning the numerical iterations. The convergence analysis of the 
new algorithm follows the standard analysis of the Newton’s method, with careful adaptation to 
handle the approximation errors introduced by the spectral sparsification. The elimination of the 
Maclaurin series speeds up the previous algorithm by a factor of log log k, where k is the relative 
condition number of D — A. 

In general, our sparsification algorithm can be used inside the Newton-like method for approx¬ 
imating the inverse q th -root of SDDM matrices to obtain a simpler and faster nearly linear time 
algorithm than the one presented in [CCL + 14] for q £ Z+, with reduction formula as follows 


(I-X)~ 1/,? 




(I 




( 7 ) 
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Our mathematical and algorithmic advances enable the sparsification of the (2 q + l)-degree poly¬ 
nomials in the middle, in turn speed up the previous algorithm by a factor of log(log(«:)/e). 

By Proposition 1.2, the random-walk matrix polynomial L a (G) = D — Y2r=i ar D • (D~ 1 A) r 
defines a weighted graph G a whose adjacency matrix is Ylr=i ' (D _ 1 A) r , and overlays d 
graphs induced by the multi-step random walks. While D — A and D — AD 1 A offers a good 
enough bound on the effective resistances of edges in G a for the purpose of path sampling, these 
estimates are relatively loose comparing with the standard approximation condition. Because 
spectral similarity implies effective-resistance similarity [ST 11 , SS11], our sparsification algorithm 
together with the construction of Spielman and Srivastava [SSI 1] provide an efficient data structure 
for effective resistances in G a . After nearly linear preprocessing time, we can answer queries 
regarding approximate effective resistances in G a in logarithmic time. 

Due to these connections to widely used tools in numerical and network analysis, we anticipate 
our nearly linear time sparsification algorithm could be useful for a variety of other tasks. 

2 Background and Notation 

We assume G = ( V , E, w) is a weighted undirected graph with n = \V\ vertices, m = \E\ edges and 
edge weights w e > 0. Let A = denote the adjacency matrix of G , i.e., a,ij = w(i,j). We let 
D to denote the diagonal matrix containing weighted degrees of vertices. Note that D~ X A is the 
transition matrix of random walks on the graph and Lq- = D — A is the Laplacian matrix of G. It 
is well known that for any vector x = (aq, ... ,x n ), 

x Iiqx — ''y ( ( x u x v ) w uv (8) 

(u,v)£E 

We use G r to denote the graph introduced by r-step random walks on G. We have Lq 2 = 
D — AD~ X A, and Lc r = D — D(D _ 1 A) r in general. 

In our analysis, we will make extensive use of spectral approximations based on the Loewner 
partial ordering of positive semidehnite matrices. Given two matrices X and Y, we use Y X 
(or equivalently X ^ Y) to denote that Y — X is positive semi-definite. Approximations using 
this ordering obey usual intuitions with approximations of positive scalars. We will also use a 
compressed, symmetric notation in situations where we have mirroring upper and lower bounds. 
We say X ~ £ Y when 


exp (e) X !>= Y exp (—e) X, (9) 

We use the following standard facts about this notion of approximation. 

Fact 2.1. For positive semi-definite matrices X, Y, W and 7i, 

1. if Y Z , then X + Y ~ e X + Z : 

2. if X Y and W « e Z, then X + W Y + Z; 

3. if X ~ ei Y and Y « £2 Z, then X ~ ei +e 2 Z; 

4- if X and Y are positive definite matrices such that X ~ e Y, then X -1 « £ Y _1 ; 
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G = GraphSampling(G = {V, E,w},r e ,M) 

1. Initialize graph G = {V,0}. 

2. For i from 1 to M: 

Sample an edge e from E with p e = T e /(fff e&E r e ). Add e to G with weight w(e)/(Mr e ). 

3. Return graph G. 


Figure 1: Pseudocode for Sampling by Effective Resistances 
5. if X « e Y and V is a matrix, then V T XV ~ g V YV. 

The Laplacian matrix is closely related to electrical flow [SSI 1, CKM + 11], For an edge with 
weight w(e), we view it as a resistor with resistance r(e) = l/w(e). Recall that the effective 
resistance between two vertices u and v R(u, v ) is defined as the potential difference induced between 
them when a unit current is injected at one and extracted at the other. Let e* denote the vector 
with 1 in the z-th entry and 0 everywhere else, the effective resistance R(u, v ) equals to (e u — 
e„) T Lt(e u — e v ), where is the Moore-Penrose Pseudoinverse of L. From this expression, we can 
see that effective resistance obeys triangle inequality. Also note that adding edges to a graph does 
not increase the effective resistance between any pair of nodes. 

In [SS11], it was shown that oversampling the edges using upper bound on the effective resistance 
suffices for constructing spectral sparsifiers. The theoretical guarantees for this sampling process 
were strengthened in [KL13]. A pseudocode of this algorithm is given in Figure 1, and its guarantees 
can be stated as follows. 

Theorem 2.2. Given a weighted undirected graph G, and upper bound on its effective resistance 
Z(e) > R(e). For any approximation parameter e > 0, there exists M = 0(logn/e 2 • (Yl e £E T e))’ 
with r e = w(e)Z{e), such that with probability at least 1 — G = GraphSampling(G, w, r e , M) 
has at most M edges, and satisfies 

(1 — c)Lg ^ Lg (1 + e)L G . (10) 

The equivalence between solving linear systems in such matrices and graph Laplacians, weakly- 
SDD matrices, and M-matrices are well known [ST04, DS08, KOSZ13, PS14], 

3 Sparsification by Sampling Random Walks 

We sparsify Ec r = D — D(D _1 A) r in two steps. In the first and critical step, we obtain an initial 
sparsifier with 0{dm log n/e 2 ) non-zeros for Lc r using an upper bound estimate on the effective 
resistance of G r obtained from L q = D — A and L g 2 = D — AD -1 A. In the second step, we 
apply the standard spectral sparsification algorithms to further reduce the number of nonzeros to 
0 (nlogn/e 2 ). 

In the first step sparsification, we bound the effective resistance on G r using Lemma 3.1 (proof 
included in Appendix B), which allows us to use the resistance of any length-r path on G to upper 
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bound the effective resistance between its two endpoints on G r . Lemma 3.2 shows that we can 
sample by these estimates efficiently. 

Lemma 3.1 (Two-Step Supports). For a graph Laplacian matrix L = D — A, with diagonal matrix 
D and nonnegative off-diagonal A, for all positive odd integer r, we have 

2 Lg ^ L Gr rl>G ■ (11) 

and for all positive even integers r we have 

L g 2 ^ L Gr ^ 2 L ° 2 ' ( 12 ) 

For each length-r path p = (uq ... u r ) in G , we have a corresponding edge in G r , with weight 
proportional to the chance of this particular path showing up in the random walk. We can view 
G r as the union of these edges, i.e., L Gr (u 0 ,u r ) = 'ff P =( Uo ... Ur ) w (p)- 

We bound the effective resistance on G r in two different ways. If r is odd, G is a good approx¬ 
imation of G r , so we can obtain an upper bound using the resistance of a length-r (not necessarily 
simple) path on G. If r is even, G 2 is a good approximation of G r , in this case, we get an upper 
bound by composing the effective resistance of 2 -hop paths in different subgraphs of G 2 . 

Lemma 3.2 (Upper Bound on Effective Resistance). For a graph G with Laplacian matrix L = 
D - A, let L Gr = D - D(D _1 A) r be its r-step random-walk matrix. Then, the effective resistance 
between two vertices uq and u r on ~L Gr upper bounded by 


R Gr {u 0 ,u r ) < ^2 
1=1 


2 


(13) 


where (uo ... u r ) is a path in G. 

Proof. When r is a positive odd integer, by Lemma 3.1, we have that ILg F T Gr , which implies 
that for any edge (u, v ) in G, 

R Gr (u,v) < 2 r G (u,v) = 2 . (14) 

V) 

Because effective resistance satisfies triangular inequality, this concludes the proof for odd r. 

When r is even, by Lemma 3.1, we have that L G2 F T Gr . The effective resistance of D — AD -1 A 
is studied in [PS14] and restated in Appendix C. For the subgraph (^(u) anchored at the vertex u 
(the subgraph formed by length -2 paths where the middle node is u ), for any two of its neighbors 
Vi and V 2 , we have 


Rc r {v !,V 2 ) < R G2 (u)(v l,v 2 ) 


1 

A(n,u) 


+ 


1 

A (u,v 2 )' 


(15) 


Because we have the above upper bound for any 2-hop path (v\ ,u, v 2 ), by the triangular inequality 
of effective resistance, the lemma holds for even r as well. □ 


Now we could sparsify G r by sampling random walks according to the approximate effective 
resistance. The following mathematical identity will be crucial in our analysis. 






Lemma 3.3 (A Random-Walk Identity). Given a graph G with m edges, consider the r-step 
random-walk graph G r and the corresponding Laplacian matrix L Q r = D — D(D _1 A) r . For a 
length-r path p = (uq ■ ■ ■ u r ) on G, we have 


nkAK-i^ Z{ )= A 2 — 

nirfDIw.w) ^Aim-uu,) 

The summation of w(p)Z(p) over all length-r paths satisfies 

Y^w(p) ■ Z(p) = 2 rm. 

p 

Proof. We substitute the expression for w(p) and Z(p) in to the summation. 

(5 raX'fe&Sr) 

= 2 y- y- / n;U A( Uj -i,Uj) II)-! A( uj ,u j+1 ) \ 


p i= 1 


2 ZZ z 

e£G i =1 \p with (ui—i,Ui)=e 


n.; = i M^-i, uj) n'j—t %%+i) 


2EE e 

e€G i =1 \p with (ui—i,Ui)=e 1 


n;=i A{uj-uuj) Y\j=i A{uj,u j+1 ) 


t i i/Ji d («j» nU^( u P u i) 


eeG i =1 
= 2 mr. 


(16) 

(17) 

(18) 

(19) 

( 20 ) 

( 21 ) 

( 22 ) 


From Equation 18 to 19, instead of enumerating all paths, we first fix an edge e to be the 
i-th edge on the path, and then extend from both ends of e. From Equation 20 to 21, we sum 
over indices iteratively from uq to Uj_i, and from u r to u*. Because D (u,u) = Yl v A(u,v), this 
summation over all possible paths anchored at ('Uj_i,iq) equals to 1 . □ 


Now we show that we can perform Step (2) in GraphSampling efficiently. We take samples 
in the same way we cancel the terms in the previous proof. Recall that sampling an edge from G r 
corresponds to sampling a path of length r in G. 


Sample a path p from G with probability proportional to t p = w{p)Z(p)\ 

a. Pick an integer k € [1 : r\ and an edge e £ G, both uniformly at random. 

b. Perform {k — l)-step random walk from one end of e. 

c. Perform (r — k)- step random walk from the other end of e. 

d. Keep track of w(p) during the process, and finally add a fraction of this edge to our 
sparsifier. 


9 











Lemma 3.4. There exists an algorithm for the Step (2) in GraphSampling, such that after 
preprocessing with work 0(n), it can draw an edge e as in Step (2) with work 0(r ■ log n). 

Proof. The task is to draw a sample p = (uq ... u r ) from the multivariate distribution V 


Pr(uo • • • u r ) = 


2 _ j | Ih=i A.(uj- 1 , Uj 

2rm A(^_i,Ui) J ^ 


E 


(23) 


For any hxed k € [1 : r] and e € G, we can rewrite the distribution as 

Pr(>o ...u r ) = Pr((rt fc _i, u k ) = e) • Pr(tt 0 ... u k+1 ... u r \(u k _ l7 u k ) = e) 
= Pr((rt fc _i, u k ) = e) • Pr(tt 0 ... u k - 2 \u k -i ) • Pr(u r ... u k+ i\u k ) 


i=k—l 


(24) 


= Pi((uk-i,Uk) = e) • Pr(ui-i\ui) ■ Pr(uj|«j_i) 

1 i=k -\-1 


Note that Pr((ttfc_i, u k ) = e) = —, and Pr(ui-\\ui) = A(ui,Ui-i)/D(ui,Ui). The three terms 
in Equation 24 corresponds to Step (a)-(c) in the sampling algorithm stated above. With linear 
preprocessing time, we can draw an uniform random edge in time O(logn), and we can also simulate 
two random walks with total length r in time 0(r log n), so Step (2) in GraphSampling can be 
done within 0(r log n) time. □ 


Combining this with spectral sparsifiers gives our algorithm for efficiently sparsifying low-degree 
polynomials. 


Proof of Theorem l.f. First we show on how to sparsify a degree-d monomial hc d = D — D • 
(D -1 A) C . We use the sampling algorithm described in Theorem 2.2, together with upper bounds 
on effective resistance of Gd obtained from D — A and D — AD -1 A. The total number of samples 
requires is 0(dm log n/e 2 ). We use Lemma 3.4 to draw a single edge from G r , where we sample 
d-step random walks on D — A, so the total running time is 0(d 2 m log 2 n/e 2 ). Now that we have a 
spectral sparsifier with 0(dm log n/e 2 ) edges, we can sparsify one more time to reduce the number 
of edges to 0(n log n/e 2 ) by [ST11, SS11] in time 0(dm log 2 n/e 2 ). 

To sparsify a random-walk matrix-polynomial L a (G), we sparsify all the even/odd terms to¬ 
gether, so the upper bound of effective resistance in Lemma 3.2 still holds. To sample an edge, 
we first decide the length r of the path, according to the probability distribution Pr(length = 
r\r is odd/even) oc a r . □ 


4 Sparsification of Higher Degree Matrix-Monomials 

We now put together the components of the algorithm for sparsifying higher degree monomials. 
Given a positive even integer 2r and A such that D — A D — D(D~ 1 A) 2r , we will show that 
we can efficiently compute 

1. A x such that D — A x ~ £ / D — AD _1 A L c 4r in Subsection 4.1, and 

2. A_|_ such that D — A + D — (AD _1 ) 2 A(D _1 A) 2 L G 2r+4 i n Subsection 4.2. 

Putting these together then gives our main theorem about sparsifying high degree monomials from 
Theorem 1.5. 
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4.1 Constructing A x 

We construct A x by sparsifying D — AD -1 A with the algorithm described in Section 3. The 
remaining is to show that spectral approximation holds under squaring, i.e., 

D- A»v D-D(D -1 A) 2r (25) 

=>D- AD _1 Aw e / D-D(D _1 A) 4r , (26) 

which directly follows Lemma 4.3 and Lemma 4.4. The result is summarized in the following lemma. 
Lemma 4.1. Let the graph Laplacian Lq = D — A and L^ 2 = D — A for r £ Z+, such that (1) 
D — A ~ e / D — D (D _1 A) 2r , and (2) A contains m nonzero entries, then for any e > 0, we can 
construct in total work 0(m log 3 n/e 4 ), a graph Laplacian = D — A x with A x containing at 
most 0(n log n/e 2 ) nonzero entries, such that 

D - A x D-D (D _1 A) 4r . (27) 

First, we will start with the fact that D — AD A is the Schur complement of the matrix 

D -A 

-A D J ’ 

onto the second half of the vertices. 

This is to allow the use of the following Lemma regarding Schur complement: 

Fact 4.2 (Lemma B.l. from [MP13]). Suppose M and M are positive semi-definite matrices 
satisfying M M, then their Schur complements on the same set of vertices also satisfy M sc h ur 

-M schur ■ 

We also need the following facts about even random walks. 

Lemma 4.3. If 0 =<; A and 

(1 —e)(D —A) ^D — A^! (1 + e) (D — A), (28) 

then 

(1 -e)(D + A) ^ D + A ^ (l + e)(D + A). (29) 

Proof. Rearranging the leftmost condition in Equation 28 gives 

A ^ eD + (1 — e) A (30) 

Adding D to both sides then gives 

D + A ^ (1 + e) D T (1 — c) A. (31) 

Combining with 0 ^ A gives 

D + A ^ (1 + e) (D + A) . (32) 

Similarly, we can prove 

(1 —e)(D + A) ^ D + A. 
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(33) 

□ 




(34) 


Lemma 4.4. If D — A D — A and D + A ~ c D + A, then 

D - AD -1 A » e D — AD X A. 

Proof. Because of Lemma 4.2, it suffices to show that 


Consider a test vector 


D 

-A 


D 

-A 

-A 

D 


-A 

D 


(35) 


x 

y 


we have 

|> T v T } 


D 

-A 


-A 

D 



X 

y 

l 

' 2 


(x + y) ' (D — A) (x + y) ' + (x - y) T (D + A) (x - y ) 


\T 


(36) 


which leads to Equation 35 , since we have D — A D — A and D + A D + A. 


□ 


4.2 Constructing A + 

We can then extend this squaring routine by composing its walk with smaller random walks on each 
side. That is, we sparsify D — (AD" 1 )" A (D" X A) 2 with the support of D — AD" 1 A and D — A. 
First, we need to prove that the symmetric composition preserve the spectral approximation. 

Lemma 4.5. If D — A D — A, then for any symmetric A such that 0 ^ A ^ D, we have 

D - AD" 1 AD" 1 A D - AD _1 AD _1 A. 

Proof. Since AD" 1 = (D _1 A) T , we can compose the given identity with D 1 A using Fact 2.1.5 
to give: 

(ad" 1 ) (d - A) (d _1 a) » e (ad" 1 ) (d - a) , 

or if expanded: 

AD _1 A - AD" 1 AD" 1 A AD X A - AD" 1 AD" 1 A 

Also, since D — A )p 0, we have 

D — AD" ! A > 0, 

which allows us to write an approximation of the form 

D - AD" X A D - AD" 1 !, 

and add it to the approximation above using Fact 2.1.2. This turns the AD _1 A terms into 
terms, and completes the proof. 

Now we prove that, with the support of D — AD" X A and D — A, there exists an efficient 
algorithms to conduct the edge sampling, which leads to the efficient sparsification. 
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Q □ 














Lemma 4.6. For graph Laplacian = D — A and Lg = D — A for r € Z+, such that (1) 
D — A D — D (D _ 1 A) 2r ; and (2) A and A eac/i contains at most m nonzero entries, then for 
any e > 0, we can construct in total work 0(m log 3 n/e 4 ), a graph Laplacian L^ + = D — A + with 
A + containing at most 0{n log n/e 2 ) nonzero entries, such that 

D-A + « e / +e D-D(D^ 1 A) 2r+4 . (37) 


Proof. By Lemma 4.5, we have 

D - (AD ” 1 ) 2 A (D _1 A ) 2 » e D - (AD -1 ) 2 D (D^A) 2 "' (D _ 1 A ) 2 = D - D (D“ 1 A ) 2r+4 . (38) 

Next step is to support D — (AD" 1 ) - A (D^A ) 2 with D — A and D — AD _ 1 A. 

First, we have 


exp (-e) (D - AD ' 1 A) =$ exp (-e) (d — D (D _ 1 A) 2r+4 ^ ^ D - (AD ” 1 ) 2 A (D^A ) 2 (39) 

We can also show that 

exp (—2e) (d - A) ^ exp (-e) (D - D (D _ 1 A) 2r ) (40) 

^ exp (-e) ^D — D (D _ 1 A) 2r+4 ^ ^ D - (AD - 1 ) 2 A (D^A ) 2 . (41) 

To sample (AD -1 )” A (D _ 1 A ) 2 efficiently, let’s consider the length-5 path 

p = (U 0 ,U 1 ,U 2 ,U 3 ,U4,U 5 ) , 

where (uq,ui), (ui,u 2 ), (w 3 ,u 4 ), (^ 4 ,^ 5 ) are edges in A and ( u 2 ,u 3 ) in A. 

The edge corresponds to p has weight w(p) as 


w(p) 


A (n 0 , ni) A (ni, n 2 ) A (u 2 , u 3 ) A (n 3 , n 4 ) A (n 4 , n 5 ) 
D (n 4 , n 4 ) D (u 2 ,u 2 ) D (u 3 , u 3 ) D (n 4 , n 4 ) 


And it has upper bound on effective resistance Z(p) as 


Zip) 


exp(e) exp(e) exp( 2 e) exp(e) exp(e) 

A (n 0 , wi) A (n 4 ,n 2 ) A(n 2 ,n 3 ) A(n 3 ,n 4 ) A(n 4 ,n 5 ) 


(42) 


(43) 


The sampling algorithm is similar to that described in Lemma 3.4, with the middle edge random- 
walk replaced with the random walk on graph A. Also, the edge index in the path is not uniform 
among 1, 2,... , 5 as in Lemma 3.4, i.e., it is now proportional to number of edges in the correspond¬ 
ing random-walk and the additional exp(e) terms occurred from the chain of spectral approxima¬ 
tion. However, the total work for path sampling remains the same for e = 0(1). After the initial 
sparsification by path sampling, can we further sparsify the graph with spectral sparsifiers. □ 
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4.3 Combining A + and A x 

Combining these two components allows us to prove our main result on sparsifying high degree 
monomials. 

Proof of Theorem 1.5. When d is divisible by 4, we start with A such that D —A ~ e / D-AD _1 A, 
and invoke Lemma 4.1 and Lemma 4.6 k = O(logd) times in total, to reach an approximation for 
D-D (D -1 A) C 1 . Moreover, if we invoke Lemma 4.1 and Lemma 4.6 with approximation parameter 
e/(2 k), and apply [ST11] with e /2 on the final output to obtain G, the total work is bounded by 
0(m log 3 n log 5 (d)/e 4 ), and G satisfies L ( - ; Lc d . 

When d is equal to 4r + 2 for some integer r, we first check if d < 4/e, in which case we can 
directly use Theorem 1.4 to sparsify L< 3 d . When d > 4/e, we can produce a sparsifier for degree 4 r 
monomial with error e/2, L^, ~ e / 2 Lg 4t . and use it directly. This is because 

(1 - A 4r ) < 1 - A 4r+2 < (1 + ^)(1 - A 4r ) < (1 + |)(1 - A 4r ), VA e (-1,1) and integer r. (44) 

So in this situation, we know that L c 4r ~ e /2 L o d ■ Combining the two spectral approximation using 
Fact 2.1.3 gives L^ ~ e L(j d . □ 

Note that Gj with even d is usually enough the study the long term effect of the random-walk 3 . 
However, it is an intriguing question both mathematically and algorithmically if one can sparsify 
Gd for all d with work polylogarithmic in d. 

5 Extension to SDDM Matrices 

Our path sampling algorithm from Section 3 can be generalized to a SDDM matrix with splitting 
D — A. The idea is to split out the extra diagonal entries to reduce it back to the Laplacian 
case. Of course, the D 1 in the middle of the monomial is changed, however, it only decreases the 
true effective resistance so the upper bound in Lemma 3.2 still holds without change. The main 
difference is that we need to put back the extra diagonal entries, which is done by multiplying an 
all 1 vector through D — D(D~ 1 A) r . 

The follow Lemma can be proved similar to Lemma 1.2. 

Lemma 5.1 (SDDM Preservation). If M = D — A is an SDDM matrix with diagonal matrix 
D and nonnegative off-diagonal A, for any nonnegative ex = (aq,..., af) with Ylr=i cty = 1, 
M a = D — Er=i CL-D(D _1 A) r is also an SDDM matrix. 

Our algorithm is a direction modification of the algorithm from Section 3. To analyze it, we 
need a variant of Lemma 3.2 that bounds errors w.r.t. the matrix by which we measure effective 
resistances. We use the following statement from [Penl3]. 

Lemma 5.2 (Lemma B.0.1. from [Penl3]). Let A = y^Ue and B benxn positive semi-definite 
matrices such that the image space of A is contained in the image space of B. and r be a set of 
estimates such that 

r e >t/^B t y e Ve. 

3 The obvious exception is that when the random-walk is periodic 
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Then for any error e and any failure probability 5 = n~ d , there exists a constant c s such that if we 
construct A using the sampling process from Figure 1, with probability at least 1 — 6 = 1 — n~ d , A 
satisfies: 

A - e (A + B) ^ A ^ A + e (A + B). 

Theorem 5.3. Let M = D — A be an SDDM matrix with diagonal D ; nonnegative off-diagonal 
A with m nonzero entries, for any nonnegative a. = (ai, ..., a^) with Ylr=i a r = k we can 
define = D — l a r D(D- 1 A) r . For any approximation parameter e > 0, we can construct 
an SDDM matrix M with 0(nlogn/e 2 ) nonzero entries, in time 0(m ■ log 2 n • d 2 /e 2 ), such that 
M ss e M a . 


Proof. We look at each monomial separately. First, by Lemma 5.1, M r is an SDDM matrix. It 
can be decomposed as the sum of two matrices, a Laplacian matrix L r = D r — D(D~ 1 A) r , and the 
remaining diagonal D ex tra- As in the Laplacian case, a length-r paths in D — A corresponds to an 
edge in L r . We apply Lemma 5.2 to M r and L r = JfeGP yeVe j where P is the set of all length-r 
paths in D — A, and y e is the column of the incidence matrix associated with e. 

When r is an odd integer, we have 


yj ( M r) 1 Ve < 2 yj (D - A) 1 y e , 
and when r is an even integer, we have 

yj (M r )-‘ y e < 2 yj (D - AD _1 A) _1 y e . 

Let e denote the edge corresponds to the length-r path (uq, ..., u r ), the weight of e is 


w(e) = w(u 0 ...u r ) = yjy e = 


nr=i a^i,^) < n: = iA(u,_i,u 


n:=i d (ui, in) it = i ' d g (ui,ui) ’ 


(45) 


(46) 


(47) 


where D y (u,u) = ^ u A(m,«), so we have the same upper bound as the Laplacian case, and we 
can sample random walks in the exact same distribution. By Lemma 5.2 there exists M = 0(r ■ m ■ 
log n/e 2 ) such that with probability at least 1— the sampled graph G = GraphSampling(GV, r e , M) 
satisfies 

M r — — e(L r + M r ) ^ ~Lq + D ex tra M r + 2 e ^ r + Air)- (48) 


Now if we set M = L^ + D cx t ra , we will have 

(1 - e)M r M (1 + e)M r . 


(49) 


Note that D ex tra can be computed efficiently by computing diag(M, l) via matrix-vector multipli¬ 
cations. □ 


6 Remarks 

We gave nearly-linear time algorithms for generating sparse approximations of several classes of 
random-walk matrix-polynomials. As our study of this problem is motivated by the low degree case 
such as for speeding up numerical methods for solving matrix equations, our results only gives part 
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of the picture for the high degree case: we are only able to sparsify even-degree monomials , and this 
routine calls spectral sparsifiers with error of e = 1/ log d at each step. Obtaining better algorithms 
for approximating the structures of long-range random-walk matrices is an intriguing mathematical 
and algorithmic question, partially due to the non-commutativity of matrix products. Extending 
our algorithm to any d, and allowing for higher error tolerances at each step are natural directions 
for future work. Furthermore, we conjecture that any degree n random-walk matrix-polynomial 
can be sparsihed in nearly-linear time. 

Our algorithms for the low degree case is based on path sampling. This routine has analogs 
in widely used combinatorial network analysis routines such as distance estimation [KTF09] and 
subgraph counting [JG05]. We believe further investigating this connection will lead to improved 
algorithms, as well as models that better explain the effectiveness of many existing ones. 
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A Laplacian Preservation 

Proposition 1.2 (Laplacian Preservation). For any weighted, undirected graph G with adjacency 
matrix A and diagonal matrix D. for every non-negative vector a = (a\,...,otd) such that with 
Ylr=i a r = 1; the random-walk matrix-polynomial L a (G) remains a Laplacian matrix. 
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Proof. First note that L a (G) = Ylt=i a r L)(D _1 A) r is symmetric and has non-positive off-diagonals, 
so to prove that L a (G) is also a Laplacian matrix, we only need to show the off-diagonals sum to 
the diagonal. Fix an integer r and a row index i, we study the z-th row sum S r of D(D - 1 A) r . 

For r = 1, we have that the row sum Si of z-tli row of A gives Si = A i,j = Di,i- We show 
that the row sum S r+ i can be reduce to S r as follows, 


S r+1 = V | (D(D -1 A) r ) i . D*‘ ■■£ A kJ I = V (DID-'A)^ = S r 

3 / k 


(50) 


By induction, we have that S n = • • • = Si = D^j. Thus, the z-th row sum of Lq,(G) 


2(L a (G)) ij = Ja r S r =D i , i 

7 r =1 


(51) 


Therefore, L a (G) is a Laplacian matrix. 


□ 


B Support from Linear and Quadratic Terms 

Lemma 3.1 (Two-Step Supports). For a graph Laplacian matrix L = D —A, with diagonal matrix 


D and nonnegative off-diagonal A, for all positive odd integer r, we have 

-Lg f L G r r< rL G . (11) 

and for all positive even integers r we have 

L g 2 ^ Lg. ^ 2 Lg2 ' 

Proof. Let X = D~z AD ' 2 , for any integer r, the statements are equivalent to 

^ (I — X) ^ I — X 2r+1 ^ (2 r + 1) (I - X) (52) 

I - X 2 ^ I - X 2r r< r (I - X 2 ) . (53) 

Because X can be diagonalized by unitary matrix U as A = UXU T , where A = diag(Ai, A 2 ,..., A n ) 


and A* € [—1,1] for all i. Therefore we can reduce the inequalities to the scalar case, and we conclude 
the proof with the following inequalities: 

-(1 — A) < 1 — A 2r+1 < (2r + 1) (1 — A), VA 6 (—1,1) and odd integer r; 

2 (54) 

(1 — A 2 ) < 1 — X 2r <r( 1 — A 2 ), VA 6 (—1,1) and even integer r. 

□ 
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C Effective Resistance on Rank One Graph 


Proposition C.l (Claim 6.3. from [PS 14]). Given a graph of size n with the Laplacian matrix 
L = D — 2 aaT ’ w ^ ere Dj,j = {&is)/d with s = EILi a *- effective resistance for edge ( i,j) is 


-(- + -) 

s at aj 


(55) 


Proof. Let e, denote the vector where the i- th entry is 1, and 0 everywhere else. We have 


dh 




(s af)ei 


^ ] Qfcfifc (s eij ) c.j + ^ ^ Qfcfifc — ■s(6j 


e j)- 


Therefore 


( e * - e j ) T L t (e i - ej ) = -(a - e j ) T (- i - - 1 ) 


-(- + -) 
S Qi Qj 


(56) 


(57) 

□ 
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